Apache Hive
The Apache Hive data warehouse framework facilitates querying and managing large datasets that reside in a distributed store or file system such as the Hadoop Distributed File System (HDFS). The following are a few highlights of the project:
- Hive offers a way to map a tabular structure onto data stored in distributed storage.
- Hive supports most of the data types available in many popular relational database platforms.
- Hive provides built-in functions and types for handling many commonly performed operations.
- Hive allows querying of the data from distributed storage through the mapped tabular structure.
- Hive offers features similar to those of relational databases, such as partitioning, external tables, and indexing (indexing is available in older releases; it was removed in Hive 3.0).
- Hive keeps its system catalog, such as metadata about Hive tables and partitioning information, in a separate database known as the Hive Metastore.
- Hive queries are written in a SQL-like language known as HiveQL.
- Hive also allows plugging in custom mappers, custom reducers, and custom user-defined functions to perform more sophisticated operations.
- HiveQL queries are executed via MapReduce: issuing a HiveQL query triggers one or more Map and/or Reduce jobs that perform the operation defined in the query.
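To make two of the points above concrete (mapping a tabular structure onto stored data, and running a query as Map and Reduce steps), here is a minimal conceptual sketch in Python. It is not Hive code and does not use any Hive API. The sample data, the `orders` table, its column names, and every function name are hypothetical and exist only for illustration. The query being imitated is roughly `SELECT category, SUM(amount) FROM orders GROUP BY category`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw contents, standing in for a comma-delimited file in HDFS.
RAW_DATA = """\
1,alice,books,12.50
2,bob,games,30.00
3,alice,games,7.25
4,carol,books,5.00
"""

# Hypothetical table schema: (column name, type converter) per field.
SCHEMA = [
    ("order_id", int),
    ("customer", str),
    ("category", str),
    ("amount", float),
]


def read_rows(raw_text, schema):
    """Apply the schema to each delimited line as it is read."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: convert(value)
               for (name, convert), value in zip(schema, fields)}


def map_phase(rows):
    """Emit (key, value) pairs: the GROUP BY column and the value to sum."""
    for row in rows:
        yield row["category"], row["amount"]


def shuffle(pairs):
    """Collect values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values for each key; here this plays the role of SUM."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(raw_text=RAW_DATA, schema=SCHEMA):
    """Chain the steps: read with schema, map, shuffle, reduce."""
    rows = read_rows(raw_text, schema)
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for category, total in sorted(run_query().items()):
        print(category, total)
```

The stored data stays a plain delimited file; the tabular view exists only in the schema applied while reading. In Hive itself the schema would live in the Metastore and the map and reduce steps would run as distributed jobs, so this single-process version should be read only as an outline of the idea.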